Ranking functions using document usage statistics

ABSTRACT

Methods of providing a document relevance score to a document on a network are disclosed. Computer readable medium having stored thereon computer-executable instructions for performing a method of providing a document relevance score to a document on a network are also disclosed. Further, computing systems containing at least one application module, wherein the at least one application module comprises application code for performing methods of providing a document relevance score to a document on a network are disclosed.

RELATED APPLICATIONS

This application claims priority to U.S. patent application Ser. No.,filed, and incorporated herein by reference thereto. See also paragraph [0016] below for additional related applications, to which no priority is claimed.

BACKGROUND

Ranking functions that rank documents according to their relevance to a given search query are known. Efforts continue in the art to develop ranking functions that provide better search results for a given search query compared to search results generated by search engines using known ranking functions.

SUMMARY

The following summary is included only to introduce some concepts discussed in the Detailed Description below. This summary is not comprehensive and is not intended to delineate the scope of the claimed subject matter, which is set forth by the claims presented at the end.

Described herein are, among other things, various technologies for determining a document relevance score for a given document on a network. The document relevance score is generated via a ranking function that comprises one or more query-independent components, wherein at least one query-independent component includes a usage parameter that takes into account actual document usage data maintained and stored on a web server for one or more documents on the network. The ranking functions may be used by a search engine to rank multiple documents in order (typically, in descending order) based on the document relevance scores of the multiple documents.

Many of the attendant features will be explained below with reference to the following detailed description considered in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein like reference numerals are used to designate like parts in the accompanying description.

FIG. 1 represents an exemplary logic flow diagram showing exemplary steps in a method of producing ranked search results in response to a search query inputted by a user;

FIG. 2 is a block diagram of some of the primary components of an exemplary operating environment for implementation of the methods and processes disclosed herein;

FIG. 3 represents a logic flow diagram showing exemplary steps in an exemplary method of determining document relevance scores for documents on a network; and

FIG. 4 represents a logic flow diagram showing exemplary steps in a method of ranking search results generated using a ranking function containing a document usage parameter.

DETAILED DESCRIPTION

Overview

To promote an understanding of the principles of the methods and processes disclosed herein, descriptions of specific embodiments follow and specific language is used to describe the specific embodiments. It will nevertheless be understood that no limitation of the scope of the disclosed methods and processes is intended by the use of specific language. Alterations, further modifications, and such further applications of the principles of the disclosed methods and processes discussed are contemplated as would normally occur to one ordinarily skilled in the art to which the disclosed methods and processes pertain.

Methods of determining a document relevance score for documents on a network are disclosed. Each document relevance score is calculated using a ranking function that desirably contains one or more query-independent components (e.g., a function component that does not depend on a given search query or search query term), one or more query-dependent components (e.g., a function component that depends on the specifics of a given search query or search query term), or a combination thereof. The document relevance scores determined by the ranking function may be used to rank documents within a network space (e.g., a corporate intranet space) according to each document relevance score. An exemplary search process in which the disclosed methods may be used is shown as exemplary process 10 in FIG. 1.

FIG. 1 depicts exemplary search process 10, which starts with process step 80, wherein a user inputs a search query. From step 80, exemplary search process 10 proceeds to step 200, wherein a search engine searches all documents within a network space for one or more terms of the search query. From step 200, exemplary search process 10 proceeds to step 300, wherein a ranking function of the search engine ranks the documents within the network space based on the relevance score of each document, the document relevance score being based on one or more query-independent components, one or more query-dependent components, or a combination thereof. From step 300, exemplary search process 10 proceeds to step 400, wherein ranked search results are presented to the user, typically in descending order, identifying documents within the network space that are most relevant to the search query.
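The flow of FIG. 1 might be sketched as follows. This is a minimal illustration only: the function name search, the whitespace-based term matching, and the assumption that relevance scores have already been computed and stored in a dictionary are all hypothetical and are not taken from the disclosure.

```python
def search(query, documents, relevance_scores):
    """Sketch of steps 200-400 of FIG. 1.

    documents: maps a URL to its text (hypothetical representation).
    relevance_scores: maps a URL to a precomputed document relevance score.
    """
    terms = query.lower().split()
    # Step 200: find documents containing one or more terms of the query.
    matches = [
        url
        for url, text in documents.items()
        if any(term in text.lower().split() for term in terms)
    ]
    # Step 300: rank matching documents by relevance score, descending.
    # Step 400 would present this ordered list to the user.
    return sorted(matches, key=lambda url: relevance_scores.get(url, 0.0), reverse=True)


documents = {
    "a": "usage statistics report",
    "b": "ranking functions",
    "c": "usage of ranking",
}
scores = {"a": 0.2, "b": 0.9, "c": 0.7}
results = search("usage", documents, scores)
```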

As discussed in more detail below, in some exemplary methods of determining a document relevance score, at least one query-independent component of a ranking function used to determine a document relevance score takes into account “document usage data” or “document usage statistics” related to actual usage of one or more documents within a network space by one or more users. The document usage data and/or statistics are generated and stored by application code on a web server, which is separate from a given search engine. For example, document usage data may be maintained by a web site so that each time a user requests a URL, the server updates a usage counter. The usage counter may maintain document-related data obtained for a given time interval, such as last week, last month, last year, or the lifetime of a given document or set of documents. Application code may be used to obtain the usage data from the web site via (i) a special application programming interface (API), (ii) a web service request, or (iii) a request for an administration web page that returns usage data for every URL on the web site.
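A server-side usage counter of the kind described above could look roughly like the following sketch. The class UsageCounter and its methods are hypothetical names chosen for illustration; the disclosure does not specify a storage layout, and keeping a timestamp per request is an assumption made so that counts for arbitrary time intervals can be reported.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class UsageCounter:
    """Hypothetical per-URL request log kept by the web server."""

    def __init__(self):
        self._hits = defaultdict(list)  # URL -> list of request timestamps

    def record_request(self, url, when):
        """Called each time a user requests a URL."""
        self._hits[url].append(when)

    def count(self, url, now, window=None):
        """Requests for url within `window` before `now` (lifetime if window is None)."""
        stamps = self._hits.get(url, [])
        if window is None:
            return len(stamps)
        cutoff = now - window
        return sum(1 for t in stamps if cutoff <= t <= now)

    def report(self, now, window=None):
        """Usage data for every URL seen so far, as an administration page might return."""
        return {url: self.count(url, now, window) for url in self._hits}


now = datetime(2024, 1, 31)
counter = UsageCounter()
counter.record_request("/docs/a", now - timedelta(days=2))
counter.record_request("/docs/a", now - timedelta(days=20))
counter.record_request("/docs/b", now - timedelta(days=400))
last_week = counter.report(now, timedelta(weeks=1))
lifetime = counter.report(now)
```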

Specific web sites may be used to generate and maintain usage data within a network space, as well as store the usage data in a local or remote storage system. Suitable web sites for generating, maintaining and storing usage data of documents within a network space include, but are not limited to, WINDOWS® SHAREPOINT® Services sites.

The disclosed methods of determining a document relevance score may further utilize a ranking function that comprises one or more additional query-independent components. Suitable additional query-independent components include, but are not limited to, a query-independent component that takes into account a click distance for each document within a network space as described in U.S. patent application Ser. No. 10/955,983 entitled “SYSTEM AND METHOD FOR RANKING SEARCH RESULTS USING CLICK DISTANCE” filed on Aug. 30, 2004, a query-independent component that takes into account a biased click distance for each document within a network space as described in U.S. patent application Ser. No. 11/206,286 entitled “RANKING FUNCTIONS USING A BIASED CLICK DISTANCE OF A DOCUMENT ON A NETWORK” filed on Aug. 15, 2005, and a query-independent component that takes into account the URL for each document within a network space as described in U.S. patent application Ser. No. 10/955,983 entitled “SYSTEM AND METHOD FOR RANKING SEARCH RESULTS USING CLICK DISTANCE” filed on Aug. 30, 2004. The subject matter of each of the above-mentioned U.S. patent applications, which are assigned to the assignee of the present patent application, is hereby incorporated by reference in its entirety.

In yet a further exemplary embodiment, the disclosed methods of determining a document relevance score utilize a ranking function that comprises at least one query-independent component, which includes both the above-described document usage parameter and one or more of the above-described additional query-independent components.

The document relevance score may be used to rank documents within a network space. For example, a method of ranking documents on a network may comprise the steps of determining a document relevance score for each document on the network using the above-described method; and ranking the documents in a desired order (typically, in descending order) based on the document relevance scores of each document.

The document relevance score may also be used to rank search results of a search query. For example, a method of ranking search results of a search query may comprise the steps of determining a document relevance score for each document in the search results of a search query using the above-described method, and ranking the documents in a desired order (typically, in descending order) based on the document relevance scores of each document.

Application programs using the methods disclosed herein may be loaded and executed on a variety of computer systems comprising a variety of hardware components. An exemplary computer system and exemplary operating environment for practicing the methods disclosed herein are described below.

Exemplary Operating Environment

FIG. 2 illustrates an example of a suitable computing system environment 100 on which the methods disclosed herein may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the methods disclosed herein. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.

The methods disclosed herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the methods disclosed herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The methods and processes disclosed herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The methods and processes disclosed herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

With reference to FIG. 2, an exemplary system for implementing the methods and processes disclosed herein includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including, but not limited to, system memory 130 to processing unit 120. System bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.

Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media as used herein.

System memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS) containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 2 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.

Computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 2 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. Hard disk drive 141 is typically connected to system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to system bus 121 by a removable memory interface, such as interface 150.

The drives and their associated computer storage media discussed above and illustrated in FIG. 2 provide storage of computer readable instructions, data structures, program modules and other data for computer 110. In FIG. 2, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.

A user may enter commands and information into computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to processing unit 120 through a user input interface 160 that is coupled to system bus 121, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to system bus 121 via an interface, such as a video interface 190. In addition to monitor 191, computer 110 may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.

Computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. Remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computer 110, although only a memory storage device 181 has been illustrated in FIG. 2. The logical connections depicted in FIG. 2 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, computer 110 is connected to LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, computer 110 typically includes a modem 172 or other means for establishing communications over WAN 173, such as the Internet. Modem 172, which may be internal or external, may be connected to system bus 121 via user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 2 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

Methods and processes disclosed herein may be implemented using one or more application programs including, but not limited to, a server system software application (e.g., WINDOWS SERVER SYSTEM™ software application), a search ranking application, and an application for generating, maintaining and storing usage data of documents within a network space (e.g., WINDOWS® SHAREPOINT® Services application), any one of which could be one of numerous application programs designated as application programs 135, application programs 145 and remote application programs 185 in exemplary system 100.

As mentioned above, those skilled in the art will appreciate that the disclosed methods of generating a document relevance score for a given document may be implemented in other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, networked personal computers, minicomputers, mainframe computers, and the like. The disclosed methods of generating a document relevance score for a given document may also be practiced in distributed computing environments, where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Implementation of Exemplary Embodiments

As discussed above, methods of determining a document relevance score for a document on a network are provided. The disclosed methods may rank a document on a network utilizing a ranking function that takes into account a document usage value of each document on the network.

The disclosed methods of determining a document relevance score for a document on a network may comprise a number of steps. In one exemplary embodiment, the method of determining a document relevance score for a document on a network comprises the steps of assigning an actual usage value (U_(A)) to one or more documents on a network comprising N documents, wherein the actual usage value (U_(A)) is based on actual usage data maintained and stored on a server; if fewer than N documents are assigned an actual usage value (U_(A)), assigning a default usage value (U_(D)) to the documents that do not have actual usage data associated therewith; and using the usage value (i.e., U_(A) or U_(D)) for each document to determine the document relevance score of a given document on the network.
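The assignment step above can be illustrated with a short sketch. The function name assign_usage_values and the dictionary representation of usage values are assumptions made for illustration; how the usage value then enters the ranking function is not shown here.

```python
def assign_usage_values(documents, actual_usage, default_value):
    """Give every document a usage value: U_A where known, otherwise U_D.

    documents: all N document identifiers on the network.
    actual_usage: maps some (possibly fewer than N) documents to U_A.
    default_value: the U_D used for documents without actual usage data.
    """
    return {doc: actual_usage.get(doc, default_value) for doc in documents}


documents = ["doc1", "doc2", "doc3"]
actual_usage = {"doc1": 12.0, "doc3": 4.0}  # doc2 has no usage data
usage_values = assign_usage_values(documents, actual_usage, default_value=1.0)
```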

As used herein, the term “actual usage data” represents one or more types of data associated with the “usage” of the document by one or more users. Types of actual usage data for a given document or set of documents may include, but are not limited to, the number of document views by all users within a given period of time, the average number of document views per user within a given period of time, total time spent on a particular document within a given period of time, average time spent on a particular document within a given period of time, etc. The given period of time may be, for example, last week, last month, last year, the lifetime of the document, or any other desired period of time.

The steps of generating, maintaining and storing document usage data or statistics for documents within a network space may be performed by application code commonly found on computing systems. Document usage data is generated, maintained and stored independently of a given search query or search engine, and is typically generated, maintained and stored by application code on the server that maintains the document (or page) and makes the document (or page) available to a user. Suitable application programs for generating, maintaining and storing document usage data or statistics include, but are not limited to, WINDOWS® SHAREPOINT® Services and other similar application programs.

Document usage data stored and maintained on these service sites, as well as other web sites performing a similar function, may be accessed using application code as discussed above. For example, document usage data may be accessed from a given web site (e.g., a WINDOWS® SHAREPOINT® Services site) via (i) a special application programming interface (API), (ii) a web service request, or (iii) a request for an administration web page that returns usage data for every URL on the web site.

The disclosed methods of determining a document relevance score for a document on a network may comprise a number of additional steps including, but not limited to, monitoring one or more documents within a network space for actual document usage; storing actual document usage data for one or more documents in a local or remote data storage file; calculating an actual usage value (U_(A)) for a document based on actual usage data for the document or a folder containing the document; storing actual usage values (U_(A)) for one or more documents in a local or remote data storage file; requesting stored document usage data or actual usage values (U_(A)) from a local or remote data storage file (e.g., a request for such data from a search engine after a specific search query by a user); retrieving actual document usage data or an actual usage value (U_(A)) for one or more documents from a local or remote data storage file; and optionally, merging a document usage value (i.e., actual or default) with one or more additional document properties to determine a document relevance score for a document.

FIG. 3 represents a logic flow diagram showing exemplary steps in an exemplary method of providing actual or default usage values for documents on a network followed by an optional downgrading/upgrading procedure by a system administrator. As shown in FIG. 3, exemplary method 401 starts at block 402 and proceeds to step 403. In step 403, a first document on the network is crawled for actual usage data.

The step of crawling a first document for actual usage data (step 403) may be performed using a crawler application capable of determining whether the first document has any actual usage data associated therewith, and if the first document has actual usage data associated therewith, retrieving the actual usage data. Suitable crawler applications for use in the disclosed methods of providing actual or default usage values for documents on a network include, but are not limited to, crawler applications described in U.S. Pat. Nos. 6,463,455 and 6,631,369, the subject matter of both of which is hereby incorporated in its entirety by reference.

As discussed above, the actual usage data may be obtained from one or more files that store actual usage data for one or more documents on a network. The actual usage data may be stored, along with the document, as a document component, or may be stored in a data storage file separate from the actual document. Suitable remote storage systems include, but are not limited to, WINDOWS® SHAREPOINT® Services (WSS) products commercially available from Microsoft Corporation (Redmond, Wash.), as well as any other similar remote storage system. For example, the WSS remote storage system records actual usage data including, for example, the number of requests to every document on a given network across all users, and produces statistics on the number of clicks per document during the last week, the last month, the last year, the overall lifetime of the document, or any other period of time. Further, as noted above, it should be understood that the methods disclosed herein are not limited to a WSS remote storage system, but may utilize a WSS remote storage system or any other similar document data system.

Once the document is crawled, exemplary method 401 proceeds to decision block 404. At decision block 404, a determination is made by application code as to whether the document has actual usage data associated therewith. If a decision is made that the document has actual usage data associated therewith, exemplary method 401 proceeds to step 405, wherein a usage value based on actual usage (U_(A)) is assigned to the document. The actual usage value (U_(A)) may be determined using one or more components of the actual usage data associated with the document. For example, in some embodiments, the actual usage value (U_(A)) assigned to the document may be related only to the number of users viewing the document. In other embodiments, the actual usage value (U_(A)) assigned to the document may be related to the number of document views by all users within a given period of time, the average number of document views per user within a given period of time, total time spent on a particular document within a given period of time, average time spent on a particular document within a given period of time, or a combination of any of the above criteria, wherein the given period of time comprises last week, last month, last year, the lifetime of the document, or any other desired period of time.
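One way to derive an actual usage value from several components of the usage data is a weighted sum, sketched below. The weighted-sum form, the UsageData fields, the weights, and the use of seconds as the time unit are all assumptions chosen for illustration; the description above only states that U_(A) may be related to one criterion or to a combination of criteria.

```python
from dataclasses import dataclass


@dataclass
class UsageData:
    """Hypothetical usage record for one document over a chosen time period."""
    views: int            # document views by all users
    unique_users: int     # number of distinct users viewing the document
    total_seconds: float  # total time spent on the document (assumed unit)


def actual_usage_value(data, w_views=1.0, w_per_user=0.0, w_time=0.0):
    """Combine usage components into a single U_A (assumed weighted sum)."""
    per_user = data.views / data.unique_users if data.unique_users else 0.0
    return w_views * data.views + w_per_user * per_user + w_time * data.total_seconds


record = UsageData(views=10, unique_users=4, total_seconds=120.0)
views_only = actual_usage_value(record)
combined = actual_usage_value(record, w_per_user=2.0)
```

With the default weights the value depends only on the view count; non-zero weights blend in the per-user average or the time spent.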

In some cases, the actual usage data associated with a given document suggests that the document was not used or viewed during a given time period. In such a case, the document could be assigned a usage value (U_(A)) equal to zero to indicate no usage during the time period; however, typically, usage values (U_(A)) based on actual use or no actual use are assigned a number other than zero.

Further, in some cases, actual usage data may be associated with a set of documents as opposed to individual documents. For example, a folder may contain a set of documents, and an associated server may only track usage data related to accessing (i.e., use of) the folder, and not the individual documents within the folder. In this embodiment, if there is actual usage data associated with a folder, a usage value (U_(A)) may be provided for each document within the folder based on the actual usage data of the folder. Typically, each usage value (U_(A)) will be the same for each document within the folder; however, different usage values (U_(A)) may be assigned to different documents within a folder if so desired.
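The folder-level case can be sketched as follows, covering only the typical situation in which every document in a folder receives the same value. The function name usage_from_folders and the two dictionaries are hypothetical.

```python
def usage_from_folders(folder_usage, folder_contents):
    """Propagate a folder's usage value to each document it contains.

    folder_usage: maps a folder to a usage value derived from folder-level data.
    folder_contents: maps a folder to the documents inside it.
    Documents in folders without usage data are left unassigned here.
    """
    values = {}
    for folder, docs in folder_contents.items():
        if folder in folder_usage:
            for doc in docs:
                values[doc] = folder_usage[folder]
    return values


folder_usage = {"/reports": 8.0}
folder_contents = {"/reports": ["q1.doc", "q2.doc"], "/archive": ["old.doc"]}
doc_values = usage_from_folders(folder_usage, folder_contents)
```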

From step 405, exemplary method 401 proceeds to decision block 406 described below.

Returning to decision block 404, if a decision is made that the document does not have actual usage data associated therewith, exemplary method 401 proceeds to step 407, wherein a default usage value (U_(D)) is assigned to the document. For example, a default usage value (U_(D)) may be assigned to a document that is part of a web site that does not maintain document usage data. The default usage value (U_(D)) assigned to the document may be used to provide an initial importance to the document relative to documents having actual usage data. For example, if a higher usage value for a given document indicates relative importance of the document within the network, assigning a lower default usage value (U_(D)) to the document downgrades the importance of the document relative to other documents on the network.

In one exemplary embodiment wherein a higher usage value for a given document indicates relative importance of the document within the network, the default usage value (U_(D)) assigned to the document may be relative to actual usage values (U_(A)) assigned to other documents on the network. For example, in order to lower the relative importance of the document, a default usage value (U_(D)) may be assigned to the document, wherein the default usage value (U_(D)) is less than any actual usage value (U_(A)) assigned to other documents on the network as described above. If it is desired to increase the relative importance of the document, a default usage value (U_(D)) may be assigned to the document, wherein the default usage value (U_(D)) is greater than any actual usage value (U_(A)) assigned to other documents on the network or greater than some of the actual usage values (U_(A)) assigned to some of the other documents on the network.

In other embodiments, a default usage value (U_(D)) may be assigned to a document without actual usage data so that the document is given an average relative importance compared to documents having an assigned actual usage value (U_(A)). For example, in this embodiment, default usage values (U_(D)) for documents without actual usage data may range from a minimum assigned actual usage value (U_(Amin)) to a maximum assigned actual usage value (U_(Amax)), or be within a specific range between the minimum assigned actual usage value (U_(Amin)) and the maximum assigned actual usage value (U_(Amax)). In this embodiment, documents without actual usage data are provided with an average relative importance, suggesting medium usage, compared to documents that have actual usage data associated therewith.
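The three ways of choosing a default usage value described above (below every actual value, above every actual value, or somewhere between U_(Amin) and U_(Amax)) might be sketched as follows. The policy names, the margin parameter, and the choice of the midpoint for the "average" case are illustrative assumptions; any value in the range between U_(Amin) and U_(Amax) would fit the description equally well.

```python
def default_usage_value(actual_values, policy="average", margin=1.0):
    """Pick U_D relative to the actual usage values already assigned.

    policy (hypothetical names):
      "below_min": lower importance than any document with actual data
      "above_max": higher importance than any document with actual data
      "average":   midpoint of U_Amin and U_Amax, suggesting medium usage
    """
    if not actual_values:
        raise ValueError("at least one actual usage value is required")
    u_min, u_max = min(actual_values), max(actual_values)
    if policy == "below_min":
        return u_min - margin
    if policy == "above_max":
        return u_max + margin
    if policy == "average":
        return (u_min + u_max) / 2
    raise ValueError(f"unknown policy: {policy}")


actual = [2.0, 6.0, 10.0]
low = default_usage_value(actual, "below_min")
high = default_usage_value(actual, "above_max")
mid = default_usage_value(actual, "average")
```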

From step 407, exemplary method 401 proceeds to decision block 406. At decision block 406, a determination is made by application code as to whether all of the documents on a network have an actual (U_(A)) or default (U_(D)) usage value. If a decision is made that not all of the documents on the network have an actual (U_(A)) or default (U_(D)) usage value, exemplary method 401 proceeds to step 408, wherein the next document is crawled for actual usage data. From step 408, exemplary method 401 returns to decision block 404 and proceeds as discussed above.
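The loop formed by steps 403 through 408 of FIG. 3 can be summarized in a short sketch. The fetch_usage callable stands in for the crawler and is hypothetical: it is assumed to return a number when a document has actual usage data and None otherwise.

```python
def assign_all_usage_values(urls, fetch_usage, default_value):
    """Sketch of FIG. 3, steps 403-408: crawl each document, then assign U_A or U_D."""
    usage_values = {}
    for url in urls:                      # steps 403 and 408: crawl first/next document
        data = fetch_usage(url)           # hypothetical crawler lookup
        if data is not None:              # decision block 404: actual usage data exists?
            usage_values[url] = float(data)    # step 405: assign U_A
        else:
            usage_values[url] = default_value  # step 407: assign U_D
    return usage_values                   # decision block 406 is satisfied when the loop ends


stored_usage = {"/a": 5, "/c": 0}
values = assign_all_usage_values(["/a", "/b", "/c"], stored_usage.get, default_value=2.5)
```

Note that a recorded count of zero is still treated as actual usage data here, which differs from having no data at all.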

Returning to decision block 406, if a determination is made by application code that all documents on the network have an actual (U_(A)) or default (U_(D)) usage value, exemplary method 401 proceeds to decision block 409. At decision block 409, a determination is made by a system administrator whether to downgrade (or upgrade) any actual (U_(A)) or default (U_(D)) usage values in order to more closely represent the importance of a given document within a network space. If a decision is made to downgrade (or upgrade) one or more actual (U_(A)) or default (U_(D)) usage values in order to more closely represent the importance of one or more documents within a network space, exemplary method 401 proceeds to step 410, wherein the actual (U_(A)) or default (U_(D)) usage value of one or more documents (or URLs) is adjusted either negatively or positively. From step 410, exemplary method 401 proceeds to step 411 described below.

Returning to decision block 409, if a decision is made not to downgrade (or upgrade) one or more actual (U_(A)) or default (U_(D)) usage values, exemplary method 401 proceeds directly to step 411. In step 411, the actual (U_(A)) and default (U_(D)) usage values are utilized in a ranking function to determine an overall document relevance score for each document within a network space. From step 411, exemplary method 401 proceeds to end block 412.
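The loop over documents and the optional administrator adjustment described above might be sketched as follows. The function name, the dictionary-based inputs, and the representation of an adjustment as a signed delta are assumptions made for illustration; the specification does not prescribe a data structure.

```python
def assign_usage_values(doc_ids, actual_usage, default_value, adjustments=None):
    """Give every document an actual or default usage value, then adjust.

    doc_ids:       iterable of document identifiers (hypothetical input shape)
    actual_usage:  dict mapping doc id -> actual usage value U_A, where known
    default_value: default usage value U_D for documents lacking usage data
    adjustments:   optional dict mapping doc id -> signed delta, standing in
                   for the administrator's downgrade or upgrade (step 410)
    """
    usage = {}
    # Visit each document in turn until all of them have a value
    # (the loop closed by decision block 406 and step 408).
    for doc_id in doc_ids:
        if doc_id in actual_usage:
            usage[doc_id] = actual_usage[doc_id]
        else:
            usage[doc_id] = default_value
    # Decision block 409 / step 410: optional negative or positive adjustment.
    for doc_id, delta in (adjustments or {}).items():
        if doc_id in usage:
            usage[doc_id] += delta
    return usage
```

The returned mapping corresponds to the values handed to the ranking function in step 411.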

Once all actual (U_(A)) and default (U_(D)) usage values have been determined and optionally downgraded (or optionally upgraded), if so desired, the actual (U_(A)) or default (U_(D)) usage value for each document may be used as a parameter in a ranking function to provide a document relevance score for each document. Such a document relevance score may be used to rank search results of a search query. An exemplary method of ranking search results generated using a ranking function containing a document usage value parameter is shown in FIG. 4.

FIG. 4 provides a logic flow diagram showing exemplary steps in exemplary method 20, wherein exemplary method 20 comprises a method of ranking search results generated using a ranking function containing a usage value parameter. As shown in FIG. 4, exemplary method 20 starts at block 201 and proceeds to step 202. In step 202, a user requests a search by inputting a search query. Prior to step 202, actual or default usage values for each of the documents on the network have been calculated. From step 202, exemplary method 20 proceeds to step 203.

In step 203, the actual or default usage value for each document on a network is merged with any other document statistics (e.g., other query-independent statistics) for each document stored in the index. Merging the actual or default usage values with other document statistics allows for a faster query response time since all the information related to ranking is clustered together. Accordingly, each document listed in the index has an associated actual or default usage value after the merge. Once the merge is complete, exemplary method 20 proceeds to step 204.
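The merge in step 203 can be pictured as attaching the usage value to each document's existing statistics record, so that everything the ranking function needs is read from one place. The sketch below assumes, purely for illustration, that the index statistics are held as a dictionary of per-document dictionaries and that the merged field is named `usage`.

```python
def merge_usage_into_index(index_stats, usage_values):
    """Return per-document statistics with the usage value merged in.

    index_stats:  dict mapping doc id -> dict of other query-independent
                  statistics (hypothetical layout)
    usage_values: dict mapping doc id -> actual or default usage value
    """
    merged = {}
    for doc_id, stats in index_stats.items():
        record = dict(stats)  # copy so the original statistics are untouched
        # Every indexed document is expected to have an actual or default
        # value by this point; a missing entry raises KeyError.
        record["usage"] = usage_values[doc_id]
        merged[doc_id] = record
    return merged
```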

In step 204, query-independent document statistics for a given document, including a usage parameter, are provided as a component of a ranking function. Query-dependent data is also provided for the given document, typically as a separate component of the ranking function. The query-dependent data or content-related portion of the ranking function depends on the actual search terms and the content of the given document.

In one embodiment, the ranking function comprises at least one query-independent (QID) component that comprises a usage parameter. In one embodiment, the query-independent (QID) component may be represented by the following equation:

$\begin{matrix}{{{QID}({doc})} = {w_{u}\frac{k_{u}U}{k_{u} + U}}} & (1)\end{matrix}$

wherein: U represents an actual usage value or a default usage value; and w_(u) and k_(u) represent tuning parameters for the usage value. In a further embodiment, the query-independent (QID) component may be represented by the following equation:

$QID(doc) = w_{u}U + k_{u} \qquad (2)$

wherein: U represents an actual usage value or a default usage value; and w_(u) and k_(u) represent tuning parameters for the usage value. In yet a further embodiment, the query-independent (QID) component may be represented by the following equation:

$QID(doc) = w_{u}\left[1 + \exp\left(-k_{u}U - B\right)\right] + C \qquad (3)$

wherein: U represents an actual usage value or a default usage value; and w_(u), k_(u), B and C represent tuning parameters (i.e., scalar constants) for the usage value.
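As a minimal sketch, the first two forms of the usage-based QID component, equations (1) and (2), can be written directly as functions of U and the tuning parameters. The function names are hypothetical, and the parameter values used with them would be chosen by tuning, which is not shown here.

```python
def qid_saturating(u, w_u, k_u):
    """Equation (1): w_u * (k_u * U) / (k_u + U).

    For U >= 0 and k_u > 0 the value grows with U but stays below w_u * k_u.
    """
    return w_u * (k_u * u) / (k_u + u)


def qid_linear(u, w_u, k_u):
    """Equation (2): w_u * U + k_u."""
    return w_u * u + k_u
```

The practical difference is that the form in equation (1) saturates, so very heavily used documents gain progressively less from additional usage, whereas the form in equation (2) keeps growing in proportion to U.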

In a further embodiment, the ranking function comprises a sum of the above-described query-independent (QID) component and at least one query-dependent (QD) component, such as

Score=QD(doc,query)+QID(doc).

The QD component can be any document scoring function. In one embodiment, the QD component corresponds to a field weighted scoring function described in U.S. patent application Ser. No. 10/804,326 entitled “FIELD WEIGHTING IN TEXT DOCUMENT SEARCHING,” filed on Mar. 18, 2004, the subject matter of which is hereby incorporated in its entirety by reference. As provided in U.S. patent application Ser. No. 10/804,326, one equation that may be used as a representation of the field weighted scoring function is as follows:

${{QD}\left( {{doc},{query}} \right)} = {\sum{\frac{{wtf}^{\prime}\left( {k_{1} + 1} \right)}{k_{1} + {wtf}^{\prime}} \times {\log \left( \frac{N}{n} \right)}}}$

wherein: wtf′ represents a weighted term frequency or sum of term frequencies of given terms in the search query multiplied by weights across all fields (e.g., the title, the body, etc. of the document) and normalized according to the length of each field and the corresponding average length, N represents a number of documents on the network, n represents a number of documents containing a query term, and k₁ is a tunable constant.

The above terms and equation are further described in detail in U.S. patent application Ser. No. 10/804,326, the subject matter of which is hereby incorporated in its entirety by reference.
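The field weighted QD equation above can be sketched as a sum over query terms. The sketch takes the weighted term frequency wtf′ of each term as an already computed input, since its computation is described in the incorporated application rather than here. The function name and input layout are hypothetical, and the natural logarithm is assumed because the equation does not state a base.

```python
import math


def qd_score(term_stats, n_docs, k1):
    """Sum over query terms of wtf'(k1 + 1) / (k1 + wtf') * log(N / n).

    term_stats: iterable of (wtf_prime, n) pairs, one per query term, where
                n is the number of documents containing that term
    n_docs:     N, the number of documents on the network
    k1:         tunable constant
    """
    total = 0.0
    for wtf_prime, n in term_stats:
        saturation = wtf_prime * (k1 + 1) / (k1 + wtf_prime)
        # Assumed natural log; rarer terms (small n) contribute more.
        total += saturation * math.log(n_docs / n)
    return total
```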

In some embodiments, the ranking function may further comprise a QID component that takes into account (i) a click distance value as determined by the methods disclosed in U.S. patent application Ser. No. 10/955,983 entitled “SYSTEM AND METHOD FOR RANKING SEARCH RESULTS USING CLICK DISTANCE” filed on Aug. 30, 2004, (ii) a biased click distance value as determined by the methods disclosed in U.S. patent application Ser. No. 11/206,286 entitled “RANKING FUNCTIONS USING A BIASED CLICK DISTANCE OF A DOCUMENT ON A NETWORK” filed on Aug. 15, 2005, the subject matter of both of which is incorporated herein by reference in its entirety, (iii) a URL depth of a document, or (iv) a combination of (i) or (ii) and (iii). For example, this optional additional QID component may comprise a function as follows:

${{QID}({doc})} = {w_{cd}\frac{k_{cd}}{k_{cd} + \frac{{b_{cd}\frac{CD}{k_{ew}}} + {b_{ud}{UD}}}{b_{cd} + b_{ud}}}}$

wherein: w_(cd) represents a weight of a query-independent component such as a component containing a click distance or biased click distance parameter, b_(cd) represents a weight of a click distance or biased click distance relative to the URL depth, b_(ud) represents a weight of a URL depth, CD represents a computed or assigned click distance or biased click distance for a document, k_(ew) represents a tuning constant that is determined by optimizing the precision of the ranking function, similar to other tuning parameters (i.e., k_(ew) may represent the edge weight value when all edges have the same edge weight value, or k_(ew) may represent the average or mean edge value when edge weight values differ from one another), UD represents a URL depth, and k_(cd) is the click distance saturation constant.

The weighted terms (w_(cd), b_(cd), and b_(ud)) assist in defining the importance of each of their related terms (i.e., the component containing a click distance or biased click distance parameter, the click distance or biased click distance value for a given document, and the URL depth of the given document respectively) and ultimately the outcome of the scoring function.

The URL depth (UD) is an optional addition to the above-referenced query-independent component to smooth the effect that the click distance or biased click distance value may have on the scoring function. For example, in some cases, a document that is not very important (i.e., has a large URL depth) may have a short click distance or biased click distance value. The URL depth is represented by the number of slashes in a document's URL. For example, www.example.com\d1\d2\d3\d4.htm includes four slashes and would therefore have a URL depth of 4. This document, however, may have a link directly from the main page www.example.com, giving it a relatively low click distance or biased click distance value. Including the URL depth term in the above-referenced function and weighting the URL depth term against the click distance or biased click distance value compensates for such a relatively low click distance or biased click distance value to more accurately reflect the document's importance within the network. Depending on the network, a URL depth of 3 or more may be considered a deep link.
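The URL depth rule and the click distance component above can be sketched together. The function names are hypothetical. The depth helper counts forward slashes after an optional scheme prefix, which is an assumption about how "number of slashes" would be applied to a conventionally written URL; the example in the text is written with backslashes.

```python
def url_depth(url):
    """Count path slashes in a URL, ignoring a leading scheme such as http://."""
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.count("/")


def qid_click_distance(cd, ud, w_cd, k_cd, k_ew, b_cd, b_ud):
    """w_cd * k_cd / (k_cd + (b_cd * CD / k_ew + b_ud * UD) / (b_cd + b_ud))."""
    # Weighted blend of the normalized click distance and the URL depth.
    blended = (b_cd * cd / k_ew + b_ud * ud) / (b_cd + b_ud)
    return w_cd * k_cd / (k_cd + blended)
```

With all tuning parameters set to 1 (illustrative values only), a document at click distance 1 and URL depth 4 scores lower than a document at click distance 1 and URL depth 1, which is the smoothing effect the URL depth term is meant to provide.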

In one embodiment, the ranking function used to determine a document relevance score for a given document comprises a function as follows:

${Score} = {{\sum{\frac{{wtf}^{\prime}\left( {k_{1} + 1} \right)}{k_{1} + {wtf}^{\prime}} \times {\log \left( \frac{N}{n} \right)}}} + {w_{cd}\frac{k_{cd}}{k_{cd} + \frac{{b_{cd}\frac{CD}{k_{ew}}} + {b_{ud}{UD}}}{b_{cd} + b_{ud}}}} + {w_{u}\frac{k_{u}U}{k_{u} + U}}}$

wherein the terms are as described above.
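As a worked illustration of the combined function, consider a single query term and the following hypothetical values, none of which come from the specification: wtf′ = 1, k₁ = 1, N = 100, n = 10, CD = 1, UD = 4, U = 3, and all remaining weights and tuning constants (w_(cd), k_(cd), k_(ew), b_(cd), b_(ud), w_(u), k_(u)) equal to 1. The natural logarithm is assumed.

```latex
\begin{aligned}
QD &= \frac{1 \cdot (1 + 1)}{1 + 1} \times \ln\left(\frac{100}{10}\right) \approx 2.303 \\
QID_{cd} &= 1 \cdot \frac{1}{1 + \frac{1 \cdot \frac{1}{1} + 1 \cdot 4}{1 + 1}} = \frac{1}{3.5} \approx 0.286 \\
QID_{u} &= 1 \cdot \frac{1 \cdot 3}{1 + 3} = 0.75 \\
Score &= QD + QID_{cd} + QID_{u} \approx 3.338
\end{aligned}
```

In this example the usage term contributes 0.75 of the total, and by the form of the expression it can never contribute more than w_(u)·k_(u) = 1 however large U becomes.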

In other embodiments, the URL depth may be removed from the ranking function or other components may be added to the ranking function to improve the accuracy of the query-dependent component, the query-independent component, or both. Furthermore, the above-described query-independent component containing a usage parameter may be incorporated into other ranking functions (not shown) to improve ranking of search results.

Once document statistics for a given document are provided to a ranking function in step 204, exemplary method 20 proceeds to step 205. In step 205, a document relevance score is determined for a given document, stored in memory, and associated with the given document. From step 205, exemplary method 20 proceeds to decision block 206.

At decision block 206, a determination is made by application code whether a document relevance score has been calculated for each document within a network. If a determination is made that a document relevance score has not been calculated for each document within a network, exemplary method 20 returns to step 204 and continues as described above. If a determination is made that a document relevance score has been calculated for each document within a network, exemplary method 20 proceeds to step 207.

In step 207, the search results of the query comprising numerous documents are ranked according to their associated document relevance scores. The resulting document relevance scores take into account the actual or default usage value of each of the documents within the network. Once the search results are ranked, exemplary method 20 proceeds to step 208 where ranked results are displayed to a user. From step 208, exemplary method 20 proceeds to step 209 where highest ranked results are selected and viewed by the user. From step 209, exemplary method 20 proceeds to step 210 where exemplary method 20 ends.
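The ranking in step 207 amounts to sorting the matching documents by their stored relevance scores. A minimal sketch follows, assuming the scores are held in a dictionary keyed by document identifier and that results are returned in descending order, as the Summary describes as typical. The function name is hypothetical.

```python
def rank_results(scores):
    """Return (doc_id, score) pairs ordered from highest to lowest score.

    scores: dict mapping doc id -> document relevance score (assumed layout)
    """
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The head of the returned list corresponds to the highest ranked results displayed to the user in step 208.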

In addition to the above-described methods of generating a document relevance score for documents within a network and using document relevance scores to rank search results of a search query, computer readable media having stored thereon computer-executable instructions for performing the above-described methods are also disclosed herein.

Computing systems are also disclosed herein. An exemplary computing system contains at least one application module usable on the computing system, wherein the at least one application module comprises application code loaded thereon, wherein the application code performs a method of generating a document relevance score for documents within a network. The application code may be loaded onto the computing system using any of the above-described computer readable media having thereon computer-executable instructions for generating a document relevance score for documents within a network and using document relevance scores to rank search results of a search query as described above.

While the specification has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily conceive of alterations to, variations of, and equivalents to these embodiments. Accordingly, the scope of the disclosed methods, computer readable medium, and computing systems should be assessed as that of the appended claims and any equivalents thereto.

1. A method of determining a document relevance score for a document on a network, said method comprising the steps of: assigning an actual usage value (U_(A)) to one or more documents on a network comprising N documents, wherein the actual usage value (U_(A)) is based on actual usage data maintained and stored on a server; if less than N documents are assigned an actual usage value (U_(A)), assigning a default usage value (U_(D)) to the documents that do not have actual usage data associated therewith; and using the usage value for each document to determine the document relevance score of a given document on the network.
2. The method of claim 1, further comprising the step of: retrieving actual usage data or an actual usage value (U_(A)) for a document from a data storage file on the server.
3. The method of claim 2, further comprising the step of: storing actual usage data or an actual usage value (U_(A)) for a document in a data storage file.
4. The method of claim 2, wherein the document relevance score for each document on the network is generated using a formula: ${Score} = {{\sum{\frac{{wtf}^{\prime}\left( {k_{1} + 1} \right)}{k_{1} + {wtf}^{\prime}} \times {\log \left( \frac{N}{n} \right)}}} + {w_{cd}\frac{k_{cd}}{k_{cd} + \frac{{b_{cd}\frac{CD}{k_{ew}}} + {b_{ud}{UD}}}{b_{cd} + b_{ud}}}} + {w_{u}\frac{k_{u}U}{k_{u} + U}}}$ wherein: wtf′ represents a weighted term frequency, N represents a number of documents on the network, n represents a number of documents containing a query term, w_(cd) represents a weight of a query-independent component, b_(cd) represents a weight of a click distance, b_(ud) represents a weight of a URL depth, CD represents a computed click distance or assigned biased click distance for a document, k_(ew) represents a tuning constant related to edge weights, UD represents a URL depth, U represents an actual usage value or a default usage value, w_(u) and k_(u) represent tuning parameters for the usage value, and k_(cd) and k₁ are constants.
5. A method of ranking documents on a network, said method comprising the steps of: determining a document relevance score for each document on the network using the method of claim 12; and ranking the documents in descending order based on the document relevance scores of each document.
6. A method of ranking search results of a search query, said method comprising the steps of: determining a document relevance score for each document in the search results of a search query using the method of claim 12; and ranking the documents in descending order based on the document relevance scores of each document.
7. A computing system containing at least one application module usable on the computing system, wherein the at least one application module comprises application code for performing a method of determining a document relevance score for a document on a network, said method comprising the steps of: assigning an actual usage value (U_(A)) to one or more documents on a network comprising N documents, wherein the actual usage value (U_(A)) is based on actual usage data maintained and stored on a server; if less than N documents are assigned an actual usage value (U_(A)), assigning a default usage value (U_(D)) to the documents that do not have actual usage data associated therewith; and using the usage value for each document to determine the document relevance score of a given document on the network.
8. The computing system of claim 7, wherein the actual usage value is dependent on one or more usage-related properties of a document or a folder containing a set of documents, said one or more usage-related properties comprising a total number of document or folder views by users within a given period of time, an average number of document or folder views per user within a given period of time, a total time spent on a particular document or folder within a given period of time, an average time spent on a particular document or folder within a given period of time, wherein the given period of time comprises last week, last month, last year, a lifetime of the document or folder, or any other period of time.
9. A computer readable medium having stored thereon computer-executable instructions for performing a process for ranking documents on a network, said process comprising: for each document in a plurality of the documents, determining if there is an actual-usage metric for the document, the actual-usage having been generated by user interactions with the document and representing a degree of actual user interaction with the document, and if there is not an actual-usage metric for the document, assigning to the document a default actual-usage metric value that is based on the actual-usage metrics of the documents generated by user interactions with the documents; storing the actual-usage metrics of the documents for use in ranking documents identified as satisfying a query of the documents; receiving a query entered by a user, the query including a query string; searching the documents to identify documents that match the query string; ranking, relative to each other, the identified documents that match the query string based on the stored respective actual-usage metrics of the identified documents; and returning to the user indicia of the identified documents and their relative rankings based on the stored actual-usage metrics.